ABSTRACT


Classifying biological datasets and identifying their important features has become a crucial problem because of their high
dimensionality. To address this, we propose EO-SCA, a novel hybrid wrapper-based feature selection technique. The Equilibrium
Optimizer (EO) is an efficient optimization algorithm based on mass balance models. The Sine Cosine Algorithm (SCA) is hybridized
with EO to improve the particles' exploration and search ability. Experiments on 20 popular medical datasets compare EO-SCA with
other algorithms and demonstrate its efficacy in terms of classification accuracy and the number of selected features.
1. INTRODUCTION


The ability to accurately identify the essential
characteristics of medical data that can help with the
diagnosis of related disorders is a major factor in the
classification of that data. Feature selection (FS)
techniques, which aim to eliminate insignificant features
in order to improve classification accuracy, can help
accomplish this goal.

FS approaches are data preprocessing methods widely
applied in data mining applications involving
classification or grouping. These methods preserve the
essential discriminating information while providing a
reduced set of input features [1]. Based on the evaluation
criteria, FS is categorised into four types: filter, wrapper,
embedded, and hybrid approaches. Filter approaches are
founded on statistical scoring metrics such as
information gain [2], correlation [3] and relief [4]. In the
wrapper method [5] [6], subsets are selected using
standard search methods, while the quality of the
selected features is evaluated using learning algorithms.

The embedded approach makes it easier to combine the
filter and wrapper methods, for example through the
weight vector of an SVM or a decision tree (DT). The
subset is chosen using filtering procedures after wrappers
have interacted with the learning system. Two
recombination strategies are frequently employed in the
hybrid method [7][8][9] to combine wrappers and filters.
In the first, the filter approach is used as a pre-processing
technique and the wrapper strategy is applied afterwards.
In the second, filter or wrapper techniques are applied
within local search strategies.

Researchers are interested in metaheuristic (MH)
algorithms since past studies have shown them to have
great potential in solving the FS problem [10]. MH
algorithms result from merging local and random search
techniques: a heuristic algorithm is used together with an
intelligent combination of several concepts to explore and
exploit the search space. The literature classifies MH
algorithms into four primary types depending on their
influences: physics-based [11], swarm intelligence [12],
human-based techniques [13], and evolutionary
algorithms [14].

The No-Free-Lunch theorem states that no
optimization method is appropriate for every
application [15]. It is therefore crucial to develop a novel
strategy or enhance existing optimization techniques
through hybridization in order to solve a particular
problem.

The principal contributions of the EO-SCA approach
are listed below:

• A hybrid approach that combines the Sine Cosine
Algorithm (SCA) and the Equilibrium Optimizer (EO)
to enhance exploration and overcome the EO
algorithm's tendency to get stuck in local optima.

• 20 different medical datasets are used to evaluate the
effectiveness of the suggested model.

• We show EO-SCA's superiority over other methods
by comparing it with many widely used and
conventional feature selection techniques.

The rest of the paper is organised as follows: Section 2
gives an explanation of the Equilibrium Optimizer
technique. The EO-SCA method for feature selection is
shown in Section 3. The experimental study's
results are provided in Section 4, while Section 5
summarises the findings.


2. EQUILIBRIUM OPTIMIZER (EO) 


A unique metaheuristic approach called the
equilibrium optimizer (EO) was presented by Faramarzi
et al. [16] in 2020. The approach seeks to determine the
equilibrium state of a system, taking the mass balance
equation in a control volume as its motivation.


Using random solutions in the search space, the initial
concentrations Q_i of the EO algorithm are produced in
the initialization phase as follows:


$$Q_i = Q_{min} + rand_i \, (Q_{max} - Q_{min}), \quad i = 1, 2, \ldots, n \tag{1}$$


The terms Q_max and Q_min denote the maximum and
minimum values a particle can take. Four particles,
together with a fifth candidate Q_em(avg) (their average),
are used to generate the equilibrium pool vector:


$$Q_{em,pool} = \{ Q_{em(1)}, Q_{em(2)}, Q_{em(3)}, Q_{em(4)}, Q_{em(avg)} \} \tag{2}$$


$$Q_{em(avg)} = \frac{Q_{em(1)} + Q_{em(2)} + Q_{em(3)} + Q_{em(4)}}{4} \tag{3}$$
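As a rough illustration of Eqs. (1)-(3), the sketch below draws an initial population and builds an equilibrium pool. The function names, the toy sphere objective, and the choice of the four fittest particles of a single population snapshot as pool members are assumptions made for illustration; they are not details given in this paper.

```python
import random


def init_population(n_particles, dim, q_min, q_max, rng):
    """Eq. (1): Q_i = Q_min + rand_i * (Q_max - Q_min), drawn per dimension."""
    return [[q_min + rng.random() * (q_max - q_min) for _ in range(dim)]
            for _ in range(n_particles)]


def equilibrium_pool(population, fitness):
    """Eqs. (2)-(3): four best particles (assumed: lowest fitness) plus their average."""
    best4 = sorted(population, key=fitness)[:4]
    avg = [sum(values) / 4 for values in zip(*best4)]
    return best4 + [avg]


rng = random.Random(0)                      # fixed seed, only for repeatability
pop = init_population(10, 3, 0.0, 1.0, rng)
pool = equilibrium_pool(pop, fitness=lambda q: sum(x * x for x in q))
```

The pool always has five members, and the fifth is the element-wise mean of the first four.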


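The exponential term (F) is the basic updating rule,
where the constants c_1 and c_2 have respective values
of 2 and 1. Here \lambda and r are random numbers in
[0,1], Itr is the current iteration and It_max is the
maximum number of iterations.


$$F = c_1 \, sign(r - 0.5) \left[ e^{-\lambda t} - 1 \right] \tag{4}$$


$$t = \left( 1 - \frac{Itr}{It_{max}} \right)^{\left( c_2 \times \frac{Itr}{It_{max}} \right)} \tag{5}$$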
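A minimal sketch of Eqs. (4) and (5) follows. The function and argument names are hypothetical, and treating `lam` (\lambda) and `r` as scalars supplied by the caller is an assumption; in the original EO formulation [16] they are random values in [0,1].

```python
import math


def time_term(itr, it_max, c2=1.0):
    """Eq. (5): t = (1 - itr/it_max) ** (c2 * itr/it_max)."""
    frac = itr / it_max
    return (1.0 - frac) ** (c2 * frac)


def exponential_term(lam, r, t, c1=2.0):
    """Eq. (4): F = c1 * sign(r - 0.5) * (exp(-lam * t) - 1)."""
    sign = (r > 0.5) - (r < 0.5)            # -1, 0 or +1
    return c1 * sign * (math.exp(-lam * t) - 1.0)
```

Since t falls from 1 at the first iteration to 0 at the last, the magnitude of F shrinks as the run progresses, so the steps taken around the equilibrium candidate become smaller.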


The exploitation phase is enhanced by the Generation
Rate (R), which, following [16], can be expressed as follows:


$$R = R_0 \, e^{-k (t - t_0)} \tag{6}$$


$$R_0 = CP \, (Q_{em} - \lambda Q) \tag{7}$$


where k is the decay constant and R_0 represents the
initial generation rate. rand_1 and rand_2 are arbitrary
numbers in the interval [0,1] and CP is the Generation
Rate Control Parameter:


$$CP = \begin{cases} 0.5 \, rand_1, & rand_2 \ge P \\ 0, & rand_2 < P \end{cases} \tag{8}$$

The following equation defines the updating rule of
the candidates by the EO algorithm, where V denotes the
control volume:


$$Q = Q_{em} + (Q - Q_{em}) \, F + \frac{R}{\lambda V} (1 - F) \tag{9}$$
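The update in Eq. (9) can be sketched for a single particle as below. This is an illustration under stated assumptions rather than the authors' implementation: the generation rate is taken as R = R_0 F with R_0 = CP (Q_em - \lambda Q), as in the original EO formulation [16]; \lambda and r are drawn per dimension; and the name `eo_update` and its signature are hypothetical.

```python
import math
import random


def eo_update(q, q_em, itr, it_max, rng, c1=2.0, c2=1.0, p=0.5, v=1.0):
    """Move particle q relative to the equilibrium candidate q_em, Eq. (9)."""
    frac = itr / it_max
    t = (1.0 - frac) ** (c2 * frac)                       # Eq. (5)
    rand1, rand2 = rng.random(), rng.random()
    cp = 0.5 * rand1 if rand2 >= p else 0.0               # control parameter CP
    new_q = []
    for qj, ej in zip(q, q_em):
        lam = rng.uniform(1e-12, 1.0)                     # lower bound avoids dividing by zero
        r = rng.random()
        sign = (r > 0.5) - (r < 0.5)
        f = c1 * sign * (math.exp(-lam * t) - 1.0)        # Eq. (4)
        rate = cp * (ej - lam * qj) * f                   # assumed R = R_0 * F
        new_q.append(ej + (qj - ej) * f + rate / (lam * v) * (1.0 - f))
    return new_q
```

At the final iteration t is 0, so F is 0 and the particle lands exactly on the equilibrium candidate.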
3. PROPOSED METHOD


Hybrid EO-SCA method 


Despite its efficiency, the EO method can face
several challenges in its exploration phase. In the
proposed algorithm, EO is integrated with the
Sine-Cosine approach (SCA) [17], which improves
particle movement. The proposed technique increases
population variety and keeps the model out of local
optima, which enhances the current EO algorithm's
searching capabilities. Additionally, the proposed
strategy strikes a balance between exploration and
exploitation. Eq. (9) is replaced by the equations
given below:


$$Q = \begin{cases} Q_{em} + \sin(r_1) \, |r_2 Q - Q_{em}| \, F + \frac{R}{\lambda V} (1 - F), & rand() < 0.5 \\ Q_{em} + \cos(r_1) \, |r_2 Q - Q_{em}| \, F + \frac{R}{\lambda V} (1 - F), & rand() \ge 0.5 \end{cases} \tag{10}$$


Here r_1 is defined as (2*pi*rand()) and r_2 is an
arbitrary number in [0,1].
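The hybrid update of Eq. (10) can be sketched in the same spirit. The assumptions are: r_1, r_2 and the sine/cosine switch are drawn independently per dimension (the text does not say whether they are per particle or per dimension); the generation rate is taken as R = R_0 F with R_0 = CP (Q_em - \lambda Q), following the original EO formulation [16]; and the name `eo_sca_update` is hypothetical.

```python
import math
import random


def eo_sca_update(q, q_em, itr, it_max, rng, c1=2.0, c2=1.0, p=0.5, v=1.0):
    """Sine/cosine variant of the EO update for one particle, Eq. (10)."""
    frac = itr / it_max
    t = (1.0 - frac) ** (c2 * frac)
    rand1, rand2 = rng.random(), rng.random()
    cp = 0.5 * rand1 if rand2 >= p else 0.0               # control parameter CP
    new_q = []
    for qj, ej in zip(q, q_em):
        lam = rng.uniform(1e-12, 1.0)                     # lower bound avoids dividing by zero
        r = rng.random()
        sign = (r > 0.5) - (r < 0.5)
        f = c1 * sign * (math.exp(-lam * t) - 1.0)        # exponential term F
        rate = cp * (ej - lam * qj) * f                   # assumed R = R_0 * F
        r1 = 2.0 * math.pi * rng.random()
        r2 = rng.random()
        trig = math.sin(r1) if rng.random() < 0.5 else math.cos(r1)
        new_q.append(ej + trig * abs(r2 * qj - ej) * f
                     + rate / (lam * v) * (1.0 - f))
    return new_q
```

Compared with the plain EO step, the only change is that the signed difference (Q - Q_em) is replaced by a sine- or cosine-weighted distance |r_2 Q - Q_em|, which lets the step direction vary around the equilibrium candidate.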
Evaluation 


It is critical to reduce the dimensionality of
the data by eliminating unnecessary and irrelevant
features, thereby boosting a specific classifier's learning
rate and accuracy. With an appropriate fitness function,
the classifier's accuracy can be raised while fewer
features are used. Fitness is determined by the following
objective function:


$$Fit_{fun} = \beta \, Cr + (1 - \beta) \frac{|F|}{|L|} \tag{11}$$


where |F| is the number of features chosen in row i,
|L| is the total number of features, Cr is the
classification error of the classifier (KNN), and \beta is
an arbitrary value between 0 and 1.
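To make the objective in Eq. (11) concrete, here is a small pure-Python sketch. Several details are assumptions, since the paper does not specify them: a hold-out split, Euclidean distance, k = 1, beta = 0.99, a penalty of 1.0 for an empty feature subset, and normalising the subset size by the total number of features. All names are hypothetical.

```python
import math


def knn_error(train_X, train_y, test_X, test_y, mask, k=1):
    """Error rate of a k-NN classifier using only features where mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]

    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))

    wrong = 0
    for x, y in zip(test_X, test_y):
        nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
        votes = [train_y[i] for i in nearest]
        pred = max(set(votes), key=votes.count)           # majority vote
        wrong += pred != y
    return wrong / len(test_X)


def fitness(mask, data, beta=0.99, k=1):
    """Eq. (11): beta * Cr + (1 - beta) * (selected features / total features)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 1.0                                        # assumed penalty for an empty subset
    cr = knn_error(*data, mask, k=k)
    return beta * cr + (1.0 - beta) * n_selected / len(mask)


# Toy data: feature 0 separates the classes, feature 1 is noise.
data = ([[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [0.9, -4.0]], [0, 0, 1, 1],
        [[0.05, -4.0], [0.95, 5.0]], [0, 1])
```

On this toy data, the mask `[1, 0]` scores lower (better) than `[1, 1]`, because dropping the noisy feature both removes the classification errors and shrinks the subset.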


4. EXPERIMENTAL STUDY


Medical Datasets 


Table 1 lists the 20 standard medical datasets used in
the study, gathered from the KEEL repository [18] and
the UCI data repository [19], to study and evaluate the
performance of the EO-SCA method.


TABLE 1: DATASETS WITH THEIR CHARACTERISTICS


Dataset              Instances  Features  Classes  Dimension
Appendicitis (D1)    106        7         2        Low (< 20)
Breast Tissue (D2)   106        10        6        Low (< 20)
Cleveland (D3)       297        13        5        Low (< 20)
Coimbra (D4)         116        10        2        Low (< 20)
E.coli (D7)          336        8                  Low (< 20)
Haberman (D8)        306        3         2        Low (< 20)
Heart Statlog (D9)   270        13        2        Low (< 20)
Hepatitis (D10)      155        19        2        Low (< 20)
ILPD (D11)           583        10        2        Low (< 20)
Lymphography (D12)   148        18        2        Low (< 20)
Mammographic (D13)   830        5         3        Low (< 20)
New Thyroid (D15)    215        5         2        Low (< 20)
Pima (D16)           768        8         2        Low (< 20)
Wisconsin (D20)      569        3         3        Low (< 20)
Dermatology (D6)     366        34        6        Medium (20-100)
Spectf Heart (D17)   267        44        4        Medium (20-100)
Thyroid (D18)        7200       21        3        Medium (20-100)
Colon (D5)           62         2000      2        High (> 100)
Leukemia (D12)       72         7070      2        High (> 100)
TOX_171 (D19)        171        5748      4        High (> 100)

Metrics 


The efficacy of EO-SCA has been evaluated using the
following four measures: 1) best fitness value, 2) mean
fitness value, 3) classification accuracy, and 4) average
feature size. The method is compared with five other
well-known feature selection techniques.


Experimental Study 


The proposed EO-SCA method is compared with five
other standard wrapper-based feature selection
methods. Each algorithm is run 10 times on each of the
20 benchmark biomedical datasets, and the best and
mean fitness values, classification accuracies, feature
sizes and running times are recorded. Initially, we
compared the EO-SCA algorithm with the EO algorithm;
the results are summarized in Table 2.


TABLE 2: COMPARISON OF EO-SCA WITH EO ALGORITHM


         Best fitness value   Mean fitness value   Mean classification accuracy   Average feature size
Dataset  EO-SCA   EO          EO-SCA   EO          EO-SCA   EO                    EO-SCA   EO
D1       0.0140   0.0480      0.0811   0.0910      91.89    90.85                 1.70     1.75
D2       0.2300   0.1460      0.2881   0.2969      71.19    70.35                 2.20     1.90
D3       0.0701   0.1190      0.1259   0.1550      87.41    84.50                 4.70     4.10
D4       0.0497   0.0453      0.1542   0.1211      84.58    87.89                 3.11     3.30
D5       0.0000   0.0000      0.0240   0.0329      97.60    96.71                 27.00    29.80
D6       0.0020   0.0020      0.0040   0.0045      99.60    99.55                 9.30     10.67
D7       0.0943   0.0944      0.1382   0.1298      86.18    87.02                 4.20     4.00
D8       0.1850   0.1850      0.2350   0.2353      76.50    76.47                 1.89     1.89
D9       0.0764   0.0590      0.1239   0.1332      87.61    86.67                 3.10     4.00
D10      0.0354   0.0350      0.0624   0.0650      93.76    93.50                 2.40     2.80
D11      0.2162   0.2162      0.2400   0.2410      76.00    75.90                 2.90     2.10
D12      0.0000   0.0000      0.0002   0.0010      99.98    99.90                 69.00    90.33
D13      0.0720   0.0716      0.0559   0.0912      94.41    90.88                 4.70     4.70
D14      0.1051   0.1511      0.1663   0.1746      83.37    82.54                 2.50     2.10
D15      0.0060   0.0065      0.0317   0.0402      96.83    95.98                 2.30     2.20
D16      0.1830   0.1982      0.2325   0.2325      77.75    76.75                 3.50     3.60
D17      0.0391   0.0212      0.0622   0.0601      93.78    93.99                 8.10     9.90
D18      0.0122   0.0102      0.0145   0.0154      98.55    98.46                 4.00     4.20
D19      0.0297   0.0306      0.0461   0.0625      95.39    93.75                 641.2    768.4
D20      0.0187   0.0203      0.0413   0.0601      95.87    93.99                 3.10     3.00
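The column averages of the accuracy values in Table 2 can be checked with a few lines of arithmetic. The snippet below simply copies the two accuracy columns from the table; the variable names are illustrative.

```python
# Mean classification accuracies (%) for D1..D20, copied from Table 2.
eo_sca_acc = [91.89, 71.19, 87.41, 84.58, 97.60, 99.60, 86.18, 76.50, 87.61, 93.76,
              76.00, 99.98, 94.41, 83.37, 96.83, 77.75, 93.78, 98.55, 95.39, 95.87]
eo_acc = [90.85, 70.35, 84.50, 87.89, 96.71, 99.55, 87.02, 76.47, 86.67, 93.50,
          75.90, 99.90, 90.88, 82.54, 95.98, 76.75, 93.99, 98.46, 93.75, 93.99]

mean_eo_sca = round(sum(eo_sca_acc) / len(eo_sca_acc), 2)   # 89.41
mean_eo = round(sum(eo_acc) / len(eo_acc), 2)               # 88.78
# Number of datasets on which EO-SCA is strictly more accurate than EO.
wins = sum(a > b for a, b in zip(eo_sca_acc, eo_acc))       # 17
```

The averages come out at 89.41% for EO-SCA and 88.78% for EO, with EO-SCA ahead on 17 of the 20 datasets (all except D4, D7 and D17).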


From Table 2, we observe that the hybrid EO-SCA
obtained superior results compared to the EO algorithm.
Using the EO-SCA and EO algorithms, the average
classification accuracies across the twenty datasets are
0.8941 and 0.8878, respectively. The average number of
features used by EO-SCA has also decreased, most
visibly on the high-dimensional datasets D12 and D19.
Therefore, the suggested algorithm produces better
overall outcomes than EO.


We also compare the EO-SCA method with four other
FS approaches: GA, SCA, PSO and GWO. The parameter
values are shown in Table 3.


TABLE 3: PARAMETER VALUES


Algorithm  Parameter values
GA         Population = 10, evaluations = 100, MR = 0.01, CR = 0.8
PSO        Population = 10, evaluations = 100, w = 0.9, c1 = c2 = 2
SCA        Population = 10, evaluations = 100
GWO        Population = 10, evaluations = 100
EO         Population = 10, evaluations = 100, c1 = 2, c2 = 1, P = 0.5, V = 1
EO-SCA     Population = 10, evaluations = 100, c1 = 2, c2 = 1, P = 0.5, V = 1


TABLE 4: BEST FITNESS VALUES OF TEN RUNS FOR ALL APPROACHES


Dataset  EO-SCA   GA       PSO      SCA      GWO
D1       0.0140   0.0476   0.0500   0.0500   0.0480
D2       0.2300   0.2401   0.2311   0.2851   0.2381
D3       0.0701   0.0702   0.1190   0.0901   0.1000
D4       0.0497   0.0435   0.1335   0.1321   0.0435
D5       0.0000   0.0000   0.0042   0.0000   0.0001
D6       0.0020   0.0000   0.0025   0.0021   0.0000
D7       0.0943   0.0746   0.1124   0.0960   0.1194
D8       0.1850   0.1852   0.1852   0.2016   0.2131
D9       0.0764   0.0926   0.0937   0.0560   0.0556
D10      0.0354   0.0370   0.0401   0.0364   0.0357
D11      0.2162   0.2155   0.2334   0.2331   0.1983
D12      0.0000   0.0011   0.0047   0.0000   0.0000
D13      0.0720   0.0345   0.0369   0.0692   0.0345
D14      0.1051   0.1145   0.1173   0.1412   0.1325
D15      0.0060   0.0000   0.0040   0.0251   0.0050
D16      0.1830   0.1835   0.2064   0.1850   0.1961
D17      0.0391   0.0189   0.0234   0.0392   0.0037
D18      0.0122   0.0123   0.0123   0.0083   0.0089
D19      0.0297   0.0588   0.0634   0.0317   0.0000
D20      0.0187   0.0265   0.0253   0.0351   0.0088


Table 6 reports the mean classification accuracies over
ten runs. The hybrid EO-SCA technique obtained the
best classification accuracy for 10 datasets. It produced
an average accuracy of 89.41% over the 20 datasets,
with EO coming in second with an average accuracy of
88.78%. The GA algorithm obtained the worst average
accuracy, 87.34%, among the six algorithms. The EO-SCA
algorithm is highly effective when applied to
high-dimensional datasets (TOX_171 and Colon), with
average accuracies of 95.39% and 97.60%, respectively.


Table 4 and Table 5 report the best fitness values
and mean fitness values over ten runs of the five
algorithms. The EO-SCA method obtained the optimal
best fitness values for 10 datasets and the optimal mean
fitness values for 11 datasets, followed by GWO with
8 datasets and 3 datasets, respectively.


TABLE 5: MEAN FITNESS VALUES OF TEN RUNS FOR ALL APPROACHES


Dataset  EO-SCA   GA       PSO      SCA      GWO
D1       0.0811   0.0850   0.0905   0.0714   0.0667
D2       0.2881   0.3035   0.2952   0.3232   0.3333
D3       0.1259   0.1604   0.1317   0.1289   0.1517
D4       0.1542   0.1571   0.1739   0.1631   0.1217
D5       0.0240   0.1144   0.0833   0.0328   0.0417
D6       0.0040   0.0113   0.0027   0.0133   0.0068
D7       0.1382   0.1400   0.1448   0.1267   0.1388
D8       0.2350   0.2438   0.2197   0.2410   0.2475
D9       0.1239   0.1347   0.0926   0.1370   0.1222
D10      0.0624   0.0773   0.0714   0.0871   0.0643
D11      0.2400   0.2522   0.2500   0.2484   0.2474
D12      0.0002   0.0652   0.0143   0.0005   0.0214
D13      0.0559   0.0678   0.0828   0.0750   0.0897
D14      0.1663   0.1590   0.1664   0.1672   0.1542
D15      0.0317   0.0278   0.0301   0.0411   0.0302
D16      0.2325   0.2334   0.2229   0.2326   0.2261
D17      0.0622   0.0647   0.0717   0.0809   0.0660
D18      0.0145   0.0147   0.0120   0.0449   0.0137
D19      0.0461   0.1664   0.1059   0.0885   0.0794
D20      0.0413   0.0532   0.0407   0.0418   0.0372


TABLE 6: MEAN CLASSIFICATION ACCURACIES OF TEN RUNS FOR ALL APPROACHES


Dataset  EO-SCA   GA       PSO      SCA      GWO
D1       91.89    91.50    90.95    92.86    93.33
D2       71.19    69.65    70.48    67.68    66.67
D3       87.41    83.96    86.83    87.11    84.83
D4       84.58    84.29    82.61    83.69    87.83
D5       97.60    88.56    91.67    96.72    95.83
D6       99.60    98.87    99.73    98.67    99.32
D7       86.18    86.00    85.52    87.33    86.12
D8       76.50    75.62    78.03    75.90    75.25
D9       87.61    86.53    87.04    86.30    87.78
D10      93.76    92.27    92.86    91.29    93.57
D11      76.00    74.78    75.00    75.16    75.26
D12      99.98    93.48    98.57    99.95    97.86
D13      94.41    93.22    91.72    92.50    91.03
D14      83.37    84.10    83.36    83.28    84.58
D15      96.83    97.23    96.97    95.89    96.98
D16      77.75    76.66    77.71    76.74    77.39
D17      93.78    93.53    92.83    91.91    93.40
D18      98.55    98.53    98.78    95.51    98.63
D19      95.39    83.36    89.41    91.15    92.06
D20      95.87    94.68    95.93    95.82    96.28
Average  89.41    87.34    88.30    88.27    88.78


Fig. 1. Classification results of all techniques over 20 datasets


Fig. 2. Mean Classification Accuracies of all approaches


Table 7 reports the average feature size over 10 runs.
The EO-SCA approach uses a small number of features
compared to the other feature selection approaches.
Colon and TOX_171 are the two high-dimensional
datasets, with 2000 and 5748 features respectively. On
these, EO-SCA uses on average only 27 and 641.22
features over 10 runs, hence proving to be efficient
compared to the other algorithms.


TABLE 7: AVERAGE FEATURE SIZE OF TEN RUNS FOR ALL APPROACHES


Dataset  EO-SCA   GA       PSO      SCA      GWO
D1       1.70     1.80     1.80     1.60     1.90
D2       2.20     2.80     2.40     2.40     2.30
D3       4.70     5.10     4.87     4.90     4.60
D4       3.11     3.70     3.77     2.75     3.90
D5       27.00    712.70   882.77   28.20    185.40
D6       9.30     11.40    12.12    9.80     10.20
D7       4.20     4.50     5.00     4.25     4.70
D8       1.89     1.70     1.70     2.72     1.50
D9       3.10     5.70     5.44     3.70     4.00
D10      2.40     2.80     4.12     2.43     4.00
D11      2.90     2.80     3.75     2.40     3.20
D12      69.00    2999     3357     70.01    757.5
D15      2.30     2.10     2.37     1.87     2.40
D19      641.22   2521.2   27443    641.40   1202
D20      3.10     3.50     6.67     2.67     3.10


TABLE 8: FRIEDMAN'S AND HOLM'S TEST


i   Algorithms   Friedman mean rankings   Holm p-value   α/(k-i)

Friedman p-value: < 10E-10

Additionally, statistical results are reported in
Table 8. First, Friedman's test was used to compare all
the algorithms. The Friedman ranks were computed for
each of the algorithms, and the resulting Friedman
p-value is less than 10E-10. Since p << 0.05, there is a
significant difference among the approaches.
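As an illustration of how Friedman mean ranks are obtained, the sketch below ranks the algorithms on each dataset (rank 1 = highest accuracy, ties sharing the average rank) and computes the Friedman chi-square statistic. It uses only the first four rows of Table 6 as sample input, so it does not reproduce the rankings of the study; the function names are hypothetical, and the p-value step (a chi-square tail probability) is omitted.

```python
def ranks_desc(row):
    """Rank 1 = highest value; tied values share the average of their positions."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def friedman(scores):
    """Return (mean ranks per algorithm, Friedman chi-square statistic)."""
    n, k = len(scores), len(scores[0])
    all_ranks = [ranks_desc(row) for row in scores]
    mean_ranks = [sum(r[j] for r in all_ranks) / n for j in range(k)]
    stat = 12 * n / (k * (k + 1)) * (sum(m * m for m in mean_ranks)
                                     - k * (k + 1) ** 2 / 4)
    return mean_ranks, stat


# Accuracies of EO-SCA, GA, PSO, SCA, GWO on D1..D4 (from Table 6).
sample = [[91.89, 91.50, 90.95, 92.86, 93.33],
          [71.19, 69.65, 70.48, 67.68, 66.67],
          [87.41, 83.96, 86.83, 87.11, 84.83],
          [84.58, 84.29, 82.61, 83.69, 87.83]]
mean_ranks, stat = friedman(sample)
```

On this four-dataset sample, EO-SCA has the lowest (best) mean rank, 1.75.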


Next, we performed Holm's test to compare
EO-SCA with the other algorithms. The Holm p-values
were calculated for each of the individual algorithms,
and each p-value was found to be less than the
corresponding α/(k-i) value. Hence, there is a
significant difference between EO-SCA and each of
the five other algorithms. On the whole, the
proposed approach (EO-SCA) outperformed the other
algorithms not only in terms of classification accuracy
but also in obtaining an optimal feature subset with
fewer features.
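The Holm step-down comparison against α/(k-i) can be sketched as follows. The p-values in the example are made up for illustration and are not the values of the study; the function name and the placeholder algorithm labels are likewise hypothetical. Here k counts all algorithms including the control, so five comparisons give k = 6.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down test against a control; returns {name: hypothesis rejected?}."""
    k = len(p_values) + 1                         # algorithms including the control
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    rejected = {}
    still_rejecting = True
    for i, (name, p) in enumerate(ordered, start=1):
        threshold = alpha / (k - i)               # alpha/(k-i): 0.01, 0.0125, ...
        still_rejecting = still_rejecting and p < threshold
        rejected[name] = still_rejecting          # once one test fails, all later ones fail
    return rejected


# Hypothetical p-values for five competitors of a control algorithm.
example = {"alg_a": 0.001, "alg_b": 0.02, "alg_c": 0.03, "alg_d": 0.2, "alg_e": 0.012}
result = holm(example)
```

With these made-up inputs only `alg_a` and `alg_e` differ significantly from the control: `alg_b` (p = 0.02) misses its threshold of 0.05/3, so the procedure stops rejecting from that point on.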


5. CONCLUSION 

In this work, a new FS method called EO-SCA is proposed
to address the FS problem in medical data classification.
EO-SCA incorporates randomization using the
Sine-Cosine algorithm to improve the EO's exploration
and exploitation. EO-SCA produced an average accuracy
of 89.41%. Its effectiveness has been shown through
comparisons with GA, SCA, PSO, GWO and traditional
EO. The proposed model avoids stagnation at local
optima and improves the exploration capacity.